Risk-Sensitive Optimal Control of Quantum Systems 
The importance of feedback control is being increasingly appreciated in quantum physics and
applications. This paper describes the use of optimal control methods in the design of quantum 
feedback control systems, and in particular the paper formulates and solves a risk-sensitive optimal 
control problem. The resulting risk-sensitive optimal control is given in terms of a new unnormal- 
ized conditional state, whose dynamics include the cost function used to specify the performance 
objective. The risk-sensitive conditional dynamic equation describes the evolution of our knowledge 
of the quantum system tempered by our purpose for the controlled quantum system. Robustness 
properties of risk-sensitive controllers are discussed, and an example is provided. 
I. INTRODUCTION

Optimal control theory provides a systematic and widely
used approach to control system design. The designer
formulates a cost function that encodes the desired
performance of the system as its minimum, and the cost
is then minimized to obtain the desired controller.
Perhaps the most famous example is Kalman's linear
quadratic Gaussian (LQG) regulator problem, where the
cost criterion is an average of an integral,

$$J^{LQG} = \mathbf{E}\Big[\sum_{k=0}^{M-1} (x_k' P x_k + u_k' Q u_k) + x_M' P_M x_M\Big] \tag{1}$$

where $x_k$ and $u_k$ are respectively state and control
variables (vectors), and $P$, $Q$ and $P_M$ are weighting
matrices. The cost criterion (1) is an example of what
is sometimes called a risk-neutral criterion. The state
(or phase space) variable $x_k$ is part of the model of
the classical physical system being controlled. In
general, the controller has only partial access to state
information, with measurements corrupted by noise.
Kalman's optimal LQG feedback controller is an explicit
function of the conditional state and covariance. It is
dynamic, since the conditional state and covariance
evolve in time via the Kalman filter (see, e.g., [?]),
and the Kalman filter does not involve the cost function
in any way; it gives the optimal mean square state
estimate independently of any control objective.
Interestingly, the function giving the optimal feedback
control is the same as for an analogous problem with
full state information, viz. multiplication by a gain
matrix determined by solving a Riccati equation.
Kalman's optimal LQG controller is the paradigm example
of the so-called separation structure, where the
controller is decomposed into an estimation part
(filtering) and a control part, as illustrated in
Figure 1.
FIG. 1: Feedback controller showing the separation
structure. 



Over the past 20 or so years, another type of optimal
control problem has generated considerable interest,
viz. linear exponential quadratic Gaussian (LEQG)
optimal control, or risk-sensitive optimal control [?].
In this average-of-exponential-of-integral problem, the
cost is of the form

$$J^{LEQG} = \mathbf{E}\Big[\exp\Big\{\mu\Big(\sum_{k=0}^{M-1} (x_k' P x_k + u_k' Q u_k) + x_M' P_M x_M\Big)\Big\}\Big]. \tag{2}$$

In this case the optimal feedback control is an explicit
function of a dynamical quantity closely related to the
conditional state and covariance, but given by dynamics
that include terms from the cost function. It also has a
separation structure, though in this case the filter
depends on the cost function used to specify the
performance objective, and is a modification of the
Kalman filter. One of the major reasons for the interest
in the risk-sensitive problem is its close connection to
robust control and minimax games [?]. Robust control
concerns the desire to design controllers that are
robust with respect to uncertainty, such as model errors
and exogenous disturbances [19]. Robustness properties
of risk-sensitive controllers are described in [?].

Risk-neutral, risk-sensitive and other stochastic 
control problems have been considered for prob- 
lems with a finite number of states, see, e.g. 
[?]. After an analysis of an example
of a machine replacement problem [?], the authors
concluded that for that problem the risk-neutral 
controller was more aggressive than risk-sensitive 
and related minimax controllers. 

Suppose we wish to control a quantum phys- 
ical system using real time feedback via a non- 
quantum feedback system (say using a digital com- 
puter) in some optimal fashion. If one were to do 
this using a standard cost criterion, say one analo- 
gous to Kalman's (LQG) regulator problem (risk- 
neutral), then one would find that the optimal con- 
trol is a function of the conditional (selective) state 
(a density operator), as is well known, see, e.g. 
[?]. The conditional state is the solution
of a stochastic master equation that describes
the evolution of our knowledge of the system. This 
stochastic master equation is used in two ways: (i) 
as the model of the quantum physical system, tak- 
ing into account the effect of the measurements, 
and (ii) as the dynamics of the filter in the optimal
controller, Figure 6.

The purpose of this paper is to consider the 
risk-sensitive optimal control of quantum physical
systems. The quantum systems are modelled
by stochastic master equations for the conditional 
state. The risk-sensitive criterion is one of a class 
of multiplicative cost functions. The optimal so- 
lution for this class of problems has a separation 
structure (Figure [?]), where the filter describes the
evolution of an unnormalized conditional state via 
a modified stochastic master equation that con- 
tains the cost function used to specify the perfor- 
mance objective. The optimal control is a func- 
tion of this unnormalized conditional state. It is 
important to note that, in contrast to the risk- 
neutral case described above, the states and dy- 
namics for the quantum physical model and the 
filter are not the same. Indeed, the unnormalized 
conditional dynamic equation used in the filter de- 
scribes the evolution of our knowledge of the quan- 
tum system tempered by our purpose for the con- 
trolled quantum system. This type of extension 
of the conditional dynamics appears to be new to 



quantum physics, and may merit further investi- 
gation. We emphasize that the unnormalized con- 
ditional state is defined only in the context of the 
risk-sensitive and multiplicative control objectives 
considered here, where it is used in a specific feedback
situation. Again, we emphasize that (i) the
model of the quantum physical system is the stan- 
dard stochastic master equation for the conditional 
state, and (ii) the filter is described by a modi- 
fied stochastic master equation for an unnormal- 
ized conditional state; this modified equation in- 
cludes terms from the cost function. 

This paper is organized as follows. In section II
we carefully describe the model we use for the
controlled quantum system. Then in section III
we summarize some relevant results for a risk-neutral
optimal control problem, and make some
comments on the feedback solution. Section IV
contains the formulation and dynamic program- 
ming solution to the risk-sensitive and related mul- 
tiplicative cost optimal control problems, together 
with a brief discussion of robustness. The ideas 
are illustrated by a simple example of a two-state 
system with feedback. Further developments, ap- 
plications and examples will be given in subsequent 
papers. 

Acknowledgement. The author wishes to thank 
Professor I.R. Petersen for many valuable discus- 
sions. 



II. THE CONTROLLED QUANTUM 
SYSTEM 

A. Controlled State Transfer 

We consider a controlled quantum physical sys- 
tem with inputs u and outputs y. The inputs rep- 
resent signals or actions that are applied to the 
system, such as voltages, forces, or light pulses. 
The outputs are signals that result from repeated
measurements of observable quantities, such as po- 
sition, spin, etc. We will assume, for simplicity, 
that the measurements are discrete valued. It is 
sometimes useful to denote the range of input and 
output values by U and Y respectively. 

The state of the quantum system is described 
by a density operator $\omega$ [29]. This state evolves in
time as a result of a variety of factors including the 
underlying unitary evolution, interaction with the 
environment, the effect of repeated measurements, 
and feedback control actions. Since measurements 
are made, and the outcomes are used to determine 
control actions in a feedback context, we are inter- 
ested in the selective or conditional evolution of the 
states. As an example [?], a range of conditional
evolutions can be described by Ito-type stochastic
master equations (SME) of the form

$$d\omega = \mathcal{L}[\omega]\,dt + \mathcal{M}[\omega]\,dW \tag{3}$$

for suitable (super) operators $\mathcal{L}$ and $\mathcal{M}$ (which
may depend on the control $u$). Here, $dW$ represents
the increment of an Ito-type Brownian motion (Wiener
process), called an innovation, related to the
measured output value $y$ by

$$dy = \mathrm{tr}\{\mathcal{N}[\omega]\}\,dt + dW \tag{4}$$

for a suitable (super) operator $\mathcal{N}$. If we denote
by $\rho$ the expected value of $\omega$ with respect to $W$
(or $y$), we obtain the master equation frequently
encountered in the analysis of open systems:

$$\dot{\rho} = \mathcal{L}[\rho]. \tag{5}$$

It is conceptually and technically simpler to 
work in discrete time, and so we will do so in 
this paper. Effectively, we will be using a model 
for sampled-data feedback control of quantum sys- 
tems. In this model, measurements are made and 
control actions are applied at discrete time instants 
$t_k$ [?], called sample times. Continuous time models
are of considerable importance, and will be
considered elsewhere.

The discrete time model we use for the quantum
system is defined in terms of a (super) operator
$\Gamma(u, y)$ [31] that depends on the control input $u$
and the output measurement $y$. The idea is that
if the quantum system is in state $\omega_k$ at time $k$,
and at this time the control value $u_k$ is applied, a
measurement outcome $y_{k+1}$ will be recorded, and
the system will transfer to a new state $\omega_{k+1}$. The
probability of $y_{k+1}$ is $p(y_{k+1}|u_k,\omega_k)$, where

$$p(y|u,\omega) = \langle \Gamma(u,y)\omega, I \rangle. \tag{6}$$

Here, we have used the notation

$$\langle \omega, B \rangle = \mathrm{tr}[B\omega] \tag{7}$$

to specify the (expected) value of an observable $B$
when the system is in state $\omega$. The operator $\Gamma(u, y)$
is assumed to be normalized, so that $p(y|u,\omega)$ is a
probability distribution, i.e. it satisfies
$\sum_{y \in \mathbf{Y}} p(y|u,\omega) = 1$.

Selective or conditional evolution means that the
new state $\omega_{k+1}$ depends on the value of the
measurement $y_{k+1}$, and we write this dependence as
follows:

$$\omega_{k+1} = \Lambda_\Gamma(u_k, y_{k+1})\,\omega_k, \tag{8}$$

where

$$\Lambda_\Gamma(u,y)\,\omega = \frac{\Gamma(u,y)\,\omega}{p(y|u,\omega)}. \tag{9}$$

Equation (8) is a discrete time stochastic master
equation (SME), and can be viewed, e.g., as the
result of integrating an equation of the form (3)
over one time step (after substituting for $dW$ in
terms of $dy$).

We denote the average of the conditional state
$\omega_k$ with respect to the measurements by $\rho_k$. If $u_k$
is a deterministic (non-random) input signal, then
$\rho_k$ satisfies the master equation

$$\rho_{k+1} = \sum_{y \in \mathbf{Y}} \Gamma(u_k, y)\,\rho_k. \tag{10}$$

Equation (8) constitutes our model of the quantum
system. For further information on this framework
of operator valued measures and quantum
operations, see [?]. We now give some
examples.

Example II.1 We define the controlled transfer
$\Gamma(u, y)$ by interleaving open system dynamics and
imperfect orthogonal measurements. The open
system dynamics are modelled by a quantum
operation

$$\mathcal{E}^u \omega = \sum_b E^u_b\, \omega\, E^{u\dagger}_b \tag{11}$$

where the controlled operators $E^u_b$ satisfy
$\sum_b E^{u\dagger}_b E^u_b = I$ for all inputs $u$. Closed systems
are described by the unitary evolution operation
$\mathcal{E}^u \omega = T^u \omega T^{u\dagger}$, where for each input value $u$,
$T^u$ is a unitary operator.

The imperfect measurements are modelled as
follows. Let $A$ be a self-adjoint operator with
discrete nondegenerate spectrum $\mathrm{spec}(A)$. For
$a \in \mathrm{spec}(A)$ an eigenvalue of $A$, let $|a\rangle$ denote
the normalized eigenvector, and let $P_a = |a\rangle\langle a|$
denote the projection onto the corresponding
eigenspace of $A$ ($P_a|\psi\rangle = \langle a|\psi\rangle\,|a\rangle$). Perfect
measurements would correspond to $y = a$; however, to
reflect the presence of measurement noise in
applications we will assume that when a measurement
occurs on the quantum system, the values $a$ and
associated projections occur in the usual (perfect)
way, but that knowledge of the outcomes is corrupted
by sensor noise so that the controller (or any
observing device or person) measures a value $y$. The
measurement $y$ is a random variable, related to the
outcomes $a$ via probability kernels $q(y|a)$, the
probability of $y$ given that $a$ occurred. The kernels
have the property that $\sum_y q(y|a) = 1$ for all $a$. In
the case of perfect measurements, $q(y|a) = 1$ if
$y = a$, and $q(y|a) = 0$ if $y \neq a$.

The operator $\Gamma(u,y)$ is given by

$$\Gamma(u, y)\,\omega = \sum_{a,b} q(y|a)\, P_a E^u_b\, \omega\, E^{u\dagger}_b P_a \tag{12}$$

and the adjoint is given by

$$\Gamma^\dagger(u,y)\,B = \sum_{a,b} q(y|a)\, E^{u\dagger}_b P_a\, B\, P_a E^u_b \tag{13}$$

where $B$ is an observable. The expressions in this
example can be derived using standard techniques
of quantum operations and discrete time filtering
based on Bayes' Rule (see, e.g., [?, Chapter 2.2],
[?, Chapter 8], [?], [?, Chapter 6], [?, Chapter
7]). □

Example II.2 (Two-state system.) We now describe
a specific instance of Example II.1, viz. a
two-state system and measurement device, where
it is desired to use feedback control to put the
system into a given state. The example is inspired by
a simple quantum feedback example [?, Section
1.3] and an example in stochastic control concerning
a machine replacement problem [16].

In [?, Section 1.3], a particle beam is passed
through a Stern-Gerlach device, which results in 
one beam of particles in the up state, and one 
beam in the down state. The beam of particles 
in the up state is subsequently left alone, while 
the beam in the down state is subject to a further 
device which will result in a change of spin direc- 
tion from down to up. The final outcome of this 
feedback arrangement is that all particles are in 
the up state. Analogous feedback configurations 
can be constructed using other physical systems, 
e.g. light and polarization measurement. 

In what follows we extend the general features of 
this example to accommodate repeated noisy mea- 
surements. Physically, the noisy measurements 
might arise from imperfectly separated beams, 
where a proportion of each beam contaminates the 
other, and/or from interference or noise affecting 
sensors. The example was chosen because the risk- 
neutral and risk-sensitive problems can be solved 
explicitly. Hence the example provides a con- 
crete illustration of some ideas concerning quan- 
tum feedback control. More substantial examples 
and applications will be considered elsewhere. 

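The pure states of the system are of the form

$$|\psi\rangle = c_{-1}|-1\rangle + c_1|1\rangle \equiv \begin{pmatrix} c_{-1} \\ c_1 \end{pmatrix}.$$

The states $|-1\rangle$ and $|1\rangle$ are eigenstates of the
observable

$$A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \tag{14}$$

corresponding to ideal measurement values $a = -1$
and $a = 1$. It is desired to put the system into the
state

$$|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \text{or} \quad |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

FIG. 2: Two-state system example showing the controlled
unitary operator $T^u$ (input $u_k$) and the noisy
measurement device M-$\alpha$ (output $y_{k+1}$) with error
probability $\alpha$.

We define a controlled transfer operator $\Gamma(u,y)$
as the following physical process, Figure 2. First
apply a unitary transformation $T^u$, where the control
value $u = 0$ means do nothing, while $u = 1$
means to flip the states (quantum not gate), i.e.

$$T^u = \begin{cases} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \text{if } u = 0 \\[2ex] \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} & \text{if } u = 1. \end{cases}$$

We then make an imperfect measurement corresponding
to the observable $A$. We model this by
an ideal device (e.g. Stern-Gerlach, beam splitter)
with projection operators

$$P_{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad P_1 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

followed by a memoryless channel with error
probability kernels

$$q(-1|-1) = 1-\alpha, \quad q(-1|1) = \alpha, \quad q(1|-1) = \alpha, \quad q(1|1) = 1-\alpha,$$

where $0 \leq \alpha \leq 1$ is the probability of a
measurement error (cf. [23, Figure 8.1]).

The controlled transfer operator is therefore
(from (12))

$$\Gamma(u,y)\,\omega = q(y|-1)\, P_{-1} T^u \omega T^{u\dagger} P_{-1} + q(y|1)\, P_1 T^u \omega T^{u\dagger} P_1.$$

In this example, the control $u$ can take the values
0 or 1, and the output $y$ has values $-1$ or 1
($\mathbf{U} = \{0, 1\}$, $\mathbf{Y} = \{-1, 1\}$).

If we write a general density matrix as

$$\omega = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{12}^* & \omega_{22} \end{pmatrix} \tag{15}$$

then the controlled operators $\Gamma(u, y)$ are given
explicitly by

$$\Gamma(0,-1)\,\omega = \begin{pmatrix} (1-\alpha)\omega_{11} & 0 \\ 0 & \alpha\,\omega_{22} \end{pmatrix}, \qquad \Gamma(0,1)\,\omega = \begin{pmatrix} \alpha\,\omega_{11} & 0 \\ 0 & (1-\alpha)\omega_{22} \end{pmatrix},$$

$$\Gamma(1,-1)\,\omega = \begin{pmatrix} (1-\alpha)\omega_{22} & 0 \\ 0 & \alpha\,\omega_{11} \end{pmatrix}, \qquad \Gamma(1,1)\,\omega = \begin{pmatrix} \alpha\,\omega_{22} & 0 \\ 0 & (1-\alpha)\omega_{11} \end{pmatrix}.$$

This example is continued in stages in the remainder
of the paper (Examples II.3, III.1, III.3 and in
section IV).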

□ 
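The explicit operators above translate directly into a few lines of code. The following is a minimal sketch, not part of the original paper: the function names `gamma`, `prob` and `update` are illustrative choices, density matrices are assumed to be plain 2x2 nested lists in the basis $(|-1\rangle, |1\rangle)$, and only the diagonal entries are used because the measurement removes the off-diagonal terms.

```python
# Sketch of the two-state transfer operator of Example II.2.
# Names and data layout are illustrative assumptions, not from the paper.

def gamma(u, y, w, alpha):
    """Unnormalized state Gamma(u, y) w: optional flip, then noisy measurement."""
    w11, w22 = w[0][0], w[1][1]
    if u == 1:                       # quantum NOT gate swaps the populations
        w11, w22 = w22, w11
    q_minus = 1 - alpha if y == -1 else alpha   # q(y | a = -1)
    q_plus = alpha if y == -1 else 1 - alpha    # q(y | a = +1)
    return [[q_minus * w11, 0.0], [0.0, q_plus * w22]]

def prob(u, y, w, alpha):
    """p(y | u, w) = tr[Gamma(u, y) w], equation (6)."""
    g = gamma(u, y, w, alpha)
    return g[0][0] + g[1][1]

def update(u, y, w, alpha):
    """Conditional state Lambda_Gamma(u, y) w, equation (9).

    Raises ZeroDivisionError if the outcome y has probability zero.
    """
    g = gamma(u, y, w, alpha)
    p = g[0][0] + g[1][1]
    return [[g[0][0] / p, 0.0], [0.0, g[1][1] / p]]

# Usage: an equal superposition, do nothing (u = 0), observe y = -1.
w = [[0.5, 0.5], [0.5, 0.5]]
w_next = update(0, -1, w, alpha=0.1)   # populations close to (0.9, 0.1)
```

Summing `prob(u, y, w, alpha)` over both outcomes gives 1 for either control value, which is the normalization assumed for $\Gamma(u,y)$.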



B. Feedback Control 

In the above description of the quantum system
(8), we have not described how the controls $u_k$ are
determined by the measurements $y_k$ via a feedback
controller $K$. We now do this.

Feedback controllers should be causal, i.e., the
current control value $u_k$ cannot depend on future
values of the measurements $y_{k+1}, y_{k+2}, \ldots$. On a
time interval $0 \leq k \leq M-1$ this is expressed as
follows:

$$K = \{K_0, K_1, \ldots, K_{M-1}\}$$

where

$$u_0 = K_0, \qquad u_1 = K_1(y_1), \qquad u_2 = K_2(y_1, y_2), \qquad \text{etc.}$$

FIG. 3: Feedback control of quantum system showing
a general feedback controller $K$ (physical system with
input $u$ and output $y$).



To simplify notation, we often write sequences
$u_{k_1}, u_{k_1+1}, \ldots, u_{k_2}$ as $u_{k_1,k_2}$. Then we can write
$u_k = K_k(y_{1,k})$. A controller $K$ can be restricted
to subintervals $k \leq j \leq M$ by fixing (or omitting)
the first arguments in the obvious way. We denote
by $\mathcal{K}$ the class of all such feedback controllers.

A feedback controller $K$ in closed loop with the
quantum system, Figure 3, operates as follows.
The given initial state $\omega_0$ and controller $K$ are
sufficient to define random sequences of states $\omega_{0,M}$,
inputs $u_{0,M-1}$ and outputs $y_{1,M}$ over a given time
interval $0 \leq k \leq M$ iteratively as follows. The
control value $u_0$ is determined by $K_0$ (no
observations are involved yet), and it is applied to the
quantum system, which responds by selecting $y_1$ at
random according to the distribution $p(y_1|u_0,\omega_0)$.
This then determines the next state $\omega_1$ via (8).
Next $u_1$ is given by $K_1(y_1)$, and applied to the
system. This process is repeated until the final 
time. 
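The iteration just described can be sketched as a short simulation. This is an illustrative sketch only, specialised to the two-state system of Example II.2: the helper names, the list-of-functions representation of $K$, and the sample controller `flip_if_down` (do nothing first, then flip if the last measurement was $-1$) are assumptions introduced here, not definitions from the paper.

```python
import random

def gamma(u, y, w, alpha):
    """Unnormalized two-state transfer Gamma(u, y) w of Example II.2."""
    w11, w22 = w[0][0], w[1][1]
    if u == 1:                                   # flip (NOT gate)
        w11, w22 = w22, w11
    q_minus = 1 - alpha if y == -1 else alpha    # q(y | a = -1)
    q_plus = alpha if y == -1 else 1 - alpha     # q(y | a = +1)
    return [[q_minus * w11, 0.0], [0.0, q_plus * w22]]

def simulate(w0, controller, alpha, rng):
    """One closed-loop run; controller[k] maps the tuple (y_1..y_k) to u_k."""
    w, us, ys = w0, [], []
    for K_k in controller:
        u = K_k(tuple(ys))                       # causal: past outputs only
        p_minus = sum(gamma(u, -1, w, alpha)[i][i] for i in range(2))
        y = -1 if rng.random() < p_minus else 1  # sample y_{k+1} from p(.|u, w)
        g = gamma(u, y, w, alpha)
        p = g[0][0] + g[1][1]
        w = [[g[0][0] / p, 0.0], [0.0, g[1][1] / p]]   # SME step (8)
        us.append(u)
        ys.append(y)
    return us, ys, w

# Hypothetical controller for a horizon of two steps.
flip_if_down = [lambda ys: 0, lambda ys: 1 if ys[-1] == -1 else 0]

us, ys, w_final = simulate([[0.5, 0.5], [0.5, 0.5]], flip_if_down,
                           alpha=0.0, rng=random.Random(0))
```

With perfect measurements (`alpha=0.0`) every run of this sketch ends with all population in $|1\rangle$, whichever first outcome is drawn.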

The controller $K$ therefore determines controlled
stochastic processes $\omega_k$, $u_k$ and $y_k$ on the interval
$0 \leq k \leq M$. Expectation with respect to the
associated probability distribution is denoted $\mathbf{E}^K_{\omega_0,0}$.
The state sequence $\omega_k$ is a controlled Markov
process.

One way a controller $K$ can be constructed is
using a function

$$u_k = \mathbf{u}(\omega_k, k)$$

where $\omega_k$ is given by (8) with initial state $\omega_0$. This
controller is denoted $K^{\mathbf{u}}$. The SME (8)
forms part of this controller, viz. its dynamics,
and must be implemented with suitable technology
(e.g. digital computer). Controllers of this
type are said to have a separation structure, where
the controller can be decomposed into an estimation
part (i.e. filtering via (8)) and a control part
(i.e. the function $\mathbf{u}$). We will see in section III
that the optimal risk-neutral controller is of this
form (Figure 6). In section IV the optimal
risk-sensitive controller also has a separation structure,
but the filter used is different (Figure [?]). The
separation structure arises naturally from the dynamic
programming techniques, as we shall see.

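Example II.3 (Two-state system with feedback,
Example II.2 continued.) We consider a particular
feedback controller $K$ for a time horizon $M = 2$
defined by

$$u_0 = K_0 = 0, \qquad u_1 = K_1(y_1) = \begin{cases} 0 & \text{if } y_1 = 1 \\ 1 & \text{if } y_1 = -1. \end{cases} \tag{16}$$

We apply $K$ to the system with initial pure state

$$|\psi_0\rangle = \frac{1}{\sqrt{2}}\big(|-1\rangle + |1\rangle\big), \quad \text{or} \quad \omega_0 = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \tag{17}$$

$\omega_0$          | $p_1$, $\omega_1$                | $p_2$, $\omega_2$
--------------------+----------------------------------+-----------------------------------------------------
$\omega_0$          | $\frac12$, $\omega_1^{(0,-1)}$   | $2\alpha(1-\alpha)$, $\omega_2^{(0,-1),(1,-1)}$
                    |                                  | $\alpha^2+(1-\alpha)^2$, $\omega_2^{(0,-1),(1,1)}$
                    | $\frac12$, $\omega_1^{(0,1)}$    | $2\alpha(1-\alpha)$, $\omega_2^{(0,1),(0,-1)}$
                    |                                  | $\alpha^2+(1-\alpha)^2$, $\omega_2^{(0,1),(0,1)}$
$\rho_0 = \omega_0$ | $\rho_1$                         | $\rho_2$

TABLE I: State evolution under the controller $K$.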

The result is shown in Table I, which displays
the resulting conditional states

$$\omega_1^{(u_0,y_1)} = \Lambda_\Gamma(u_0,y_1)\,\omega_0, \qquad \omega_2^{(u_0,y_1),(u_1,y_2)} = \Lambda_\Gamma(u_1,y_2)\,\omega_1^{(u_0,y_1)}$$

and the associated probabilities. Explicitly, the
terms shown in Table I are:

$$p_1 = p(y_1|u_0,\omega_0), \qquad p_2 = p(y_2|u_1,\omega_1)$$

$$\omega_1^{(0,-1)} = \begin{pmatrix} 1-\alpha & 0 \\ 0 & \alpha \end{pmatrix}, \qquad \omega_1^{(0,1)} = \begin{pmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{pmatrix}$$

$$\omega_2^{(0,-1),(1,-1)} = \omega_2^{(0,1),(0,-1)} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

$$\omega_2^{(0,-1),(1,1)} = \omega_2^{(0,1),(0,1)} = \frac{1}{\alpha^2 + (1-\alpha)^2}\begin{pmatrix} \alpha^2 & 0 \\ 0 & (1-\alpha)^2 \end{pmatrix}$$

Also shown are the non-selective states:

$$\rho_0 = \omega_0, \qquad \rho_1 = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

$$\rho_2 = \frac{1}{2}\Big[ p(-1|1,\omega_1^{(0,-1)})\,\omega_2^{(0,-1),(1,-1)} + p(1|1,\omega_1^{(0,-1)})\,\omega_2^{(0,-1),(1,1)} + p(-1|0,\omega_1^{(0,1)})\,\omega_2^{(0,1),(0,-1)} + p(1|0,\omega_1^{(0,1)})\,\omega_2^{(0,1),(0,1)} \Big] = \begin{pmatrix} \alpha^2 + \alpha(1-\alpha) & 0 \\ 0 & \alpha(1-\alpha) + (1-\alpha)^2 \end{pmatrix}. \tag{18}$$

At time $k = 0$ the control $u_0 = 0$ is applied.
If $y_1 = -1$ is observed, as a result of the imperfect
measurement, the system moves to the state
$\omega_1^{(0,-1)}$. Since $y_1 = -1$, the controller $K$ (16) gives
$u_1 = 1$. This results in the states $\omega_2^{(0,-1),(1,-1)}$
or $\omega_2^{(0,-1),(1,1)}$, depending on the outcome of the
second measurement $y_2$. If, on the other hand, $y_1 = 1$
is observed, the system moves to the state $\omega_1^{(0,1)}$.
Since $y_1 = 1$, the controller $K$ (16) gives $u_1 = 0$,
and hence $\omega_2^{(0,1),(0,-1)}$ or $\omega_2^{(0,1),(0,1)}$, again
depending on the outcome of the second measurement $y_2$.
This is illustrated in Figure 4.

FIG. 4: Physical realization of the two stages of the
two-state system with feedback using controller $K$.
Due to the merging of the beams in the second stage,
we have the intermediate state
$\omega_2 = \frac12\omega_2^{(0,-1),(1,-1)} + \frac12\omega_2^{(0,1),(0,-1)}$
if $y_2 = -1$ (with probability $2\alpha(1-\alpha)$), or
$\omega_2 = \frac12\omega_2^{(0,-1),(1,1)} + \frac12\omega_2^{(0,1),(0,1)}$
if $y_2 = 1$ (with probability $\alpha^2 + (1-\alpha)^2$).

These results are consistent with [?, Section
1.3]. Indeed, when $\alpha = 0$ (perfect measurements),
the feedback system terminates in the desired pure
state $\rho_2 = |1\rangle\langle 1|$. The role of feedback control is
clearly demonstrated here. With imperfect
measurements, $0 < \alpha < 1$, the system terminates in
the mixed state $\rho_2$ given by (18), with the degree
of mixing (indicating the expected degradation in
performance) depending on the measurement error
probability parameter $\alpha$:

$$\mathrm{tr}\,\rho_2^2 = \big(\alpha^2 + \alpha(1-\alpha)\big)^2 + \big(\alpha(1-\alpha) + (1-\alpha)^2\big)^2 \;\begin{cases} < 1 & \text{if } 0 < \alpha < 1 \\ = 1 & \text{if } \alpha = 0. \end{cases}$$

□

III. RISK-NEUTRAL CONTROL



In this section we summarize dynamic programming
results for a well-known type of finite time
horizon optimal control problem [?]. The optimal
control problem discussed here can be considered
to be a prototype problem illustrating measurement
feedback in the quantum context. The
dynamic programming methods used in this paper
for solving the optimal control problems are
standard, and the reader is referred to the literature
for further information, see, e.g., [?].

We define a cost function to be a non-negative
observable $L(u)$ that can depend on the control $u$.
The cost function encodes the designer's control
objective. We also use a non-negative observable
$N$ to define a cost for the final state.

Example III.1 (Two-state system with feedback,
Example II.3 continued.) To set up the cost function
$L(u)$ to reflect our objective of regulating the
system to the desired pure state we define

$$X = \frac{1}{2}(A - I) = \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix}$$

where $A$ is the observable corresponding to the
projective measurement (14). We note that the
expected value of $X^2$ is

$$\langle 1|X^2|1\rangle = \mathrm{tr}[X^2\,|1\rangle\langle 1|] = 0$$

$$\langle -1|X^2|-1\rangle = \mathrm{tr}[X^2\,|-1\rangle\langle -1|] = 1$$

which gives zero cost to the desired state, and
nonzero cost to the undesired state. We shall also
introduce a cost of control action, as follows:

$$c(u) = \begin{cases} 0 & \text{if } u = 0 \\ p & \text{if } u = 1 \end{cases}$$

where $p \geq 0$. This gives zero cost for doing nothing,
and a cost $p$ for the flip operation. Thus
we define the cost function to be

$$L(u) = X^2 + c(u)I \tag{19}$$

and the cost for the final state is defined to be

$$N = X^2.$$

This modifies our earlier objective of putting the 
system into the desired state by including a penalty 
for control action. □ 
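As a small worked step, added here for illustration, the expected costs for a general density matrix (15) follow from the pairing (7), the definition (19), and $\mathrm{tr}\,\omega = 1$:

$$\langle \omega, L(u) \rangle = \mathrm{tr}\big[(X^2 + c(u)I)\,\omega\big] = \omega_{11} + c(u), \qquad \langle \omega, N \rangle = \mathrm{tr}[X^2 \omega] = \omega_{11}.$$

The running cost is therefore the population of the undesired state $|-1\rangle$ plus the control penalty.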

Let $M > 0$ be a positive integer indicating a
finite time interval $k = 0, \ldots, M$. Given a
sequence of control values $u_{0,M-1} = u_0, \ldots, u_{M-1}$
and measurements $y_{1,M} = y_1, \ldots, y_M$, define the
risk-neutral cost functional

$$J_{\omega,0}(K) = \mathbf{E}^K_{\omega,0}\Big[\sum_{i=0}^{M-1} \langle \omega_i, L(u_i) \rangle + \langle \omega_M, N \rangle\Big] \tag{20}$$

where $\omega_i$, $i = 0, \ldots, M$ is the solution of the system
dynamics (8) with initial state $\omega_0 = \omega$ under the
action of a controller $K$. This is an appropriate
quantum generalization of the classical LQG cost
(1). The objective is to minimize this functional
over all measurement feedback controllers $K \in \mathcal{K}$.

Following [?] it is convenient to rewrite the cost
functional (20). For each $k$, given a sequence of
control values $u_{k,M-1} = u_k, \ldots, u_{M-1}$ and
measurements $y_{k+1,M} = y_{k+1}, \ldots, y_M$, define a
random sequence of observables $Q_k$ by the recursion
([?, equation (3.1)])

$$Q_k = \Gamma^\dagger(u_k, y_{k+1})\,Q_{k+1} + L(u_k), \quad 0 \leq k \leq M-1, \qquad Q_M = N. \tag{21}$$

When useful, we write

$$Q_k = Q_k(u_{k,M-1}, y_{k+1,M})$$

to indicate dependence on the input and outputs.
$Q_k$ may be called a cost observable. The cost
functional (20) is given by

$$J_{\omega,0}(K) = \sum_{y_{1,M} \in \mathbf{Y}^M} \langle \omega, Q_0(K(y_{1,M})_{0,M-1}, y_{1,M}) \rangle. \tag{22}$$

Here and elsewhere we use abbreviations of the
form

$$K(y_{1,M})_{0,M-1} = (K_0, K_1(y_1), \ldots, K_{M-1}(y_{1,M-1})).$$

Remark III.2 The cost observable $Q_k$ given by
(21) and the expression in (22) are analogous to
the familiar Heisenberg picture used in quantum
physics. It is very natural from the point of view
of dynamic programming, and indeed (20) and (22)
are related by iterating (21). Here is the first step:

$$\langle \omega_0, Q_0 \rangle = \langle \omega_0, \Gamma^\dagger(u_0, y_1) Q_1 + L(u_0) \rangle$$
$$= \langle \omega_0, L(u_0) \rangle + \langle \Gamma(u_0, y_1)\omega_0, Q_1 \rangle$$
$$= \langle \omega_0, L(u_0) \rangle + \langle \omega_1, Q_1 \rangle\, p(y_1|u_0, \omega_0)$$

where $\omega_1 = \Lambda_\Gamma(u_0, y_1)\omega_0$ and $p(y_1|u_0, \omega_0)$ is given
by (6). □
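The first step above rests on the duality $\langle \Gamma(u,y)\omega, B \rangle = \langle \omega, \Gamma^\dagger(u,y)B \rangle$. The following sketch checks that step numerically for the two-state system of Example II.2. It is an illustration only: the function names and the nested-list representation are assumptions, and the adjoint is written out from (13) for the special case of the flip unitary and projective measurement.

```python
def q(y, a, alpha):
    """Measurement error kernel q(y | a) of Example II.2."""
    return 1 - alpha if y == a else alpha

def pair(w, B):
    """<w, B> = tr[B w] for 2x2 nested lists, equation (7)."""
    return sum(B[i][j] * w[j][i] for i in range(2) for j in range(2))

def gamma(u, y, w, alpha):
    """Gamma(u, y) w: flip if u == 1, then project and weight by q."""
    w11, w22 = (w[1][1], w[0][0]) if u == 1 else (w[0][0], w[1][1])
    return [[q(y, -1, alpha) * w11, 0.0], [0.0, q(y, 1, alpha) * w22]]

def gamma_adj(u, y, B, alpha):
    """Adjoint of Gamma(u, y) acting on B: project and weight, then undo the flip."""
    b_minus, b_plus = q(y, -1, alpha) * B[0][0], q(y, 1, alpha) * B[1][1]
    if u == 1:
        b_minus, b_plus = b_plus, b_minus
    return [[b_minus, 0.0], [0.0, b_plus]]

# First step of Remark III.2 with Q_1 = N = X^2 and L(0) = X^2 (c(0) = 0).
alpha, u0, y1 = 0.1, 0, -1
w0 = [[0.5, 0.5], [0.5, 0.5]]
N = [[1.0, 0.0], [0.0, 0.0]]
L0 = [[1.0, 0.0], [0.0, 0.0]]
Q1 = N
G = gamma_adj(u0, y1, Q1, alpha)
Q0 = [[G[i][j] + L0[i][j] for j in range(2)] for i in range(2)]

g = gamma(u0, y1, w0, alpha)
p = g[0][0] + g[1][1]                          # p(y1 | u0, w0)
w1 = [[g[i][j] / p for j in range(2)] for i in range(2)]

lhs = pair(w0, Q0)                             # <w0, Q0>
rhs = pair(w0, L0) + pair(w1, Q1) * p          # <w0, L(u0)> + <w1, Q1> p
```

In this sketch `lhs` and `rhs` agree up to rounding, as the remark asserts.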

The key idea of dynamic programming is to look
at the current state at a current time $0 \leq k \leq M-1$
and to optimize the remaining cost from
the current time to the final time. This leads to an
iterative solution. Accordingly, we define, for each
$0 \leq k \leq M$, the cost to go incurred by a controller
$K$ (restricted to $k \leq l \leq M-1$) to be

$$J_{\omega,k}(K) = \sum_{y_{k+1,M} \in \mathbf{Y}^{M-k}} \langle \omega, Q_k(K(y_{k+1,M})_{k,M-1}, y_{k+1,M}) \rangle. \tag{23}$$

The dynamic programming equation associated
with this risk-neutral problem is

$$V(\omega, k) = \inf_{u \in \mathbf{U}} \Big\{ \langle \omega, L(u) \rangle + \sum_{y \in \mathbf{Y}} V(\Lambda_\Gamma(u, y)\omega, k+1)\, p(y|u,\omega) \Big\},$$
$$V(\omega, M) = \langle \omega, N \rangle \tag{24}$$
where $0 \leq k \leq M-1$. This is the fundamental
equation from which optimality or otherwise of a
controller can be determined.

Let $V$ be the solution to the dynamic programming
equation (24). Then for any controller $K \in \mathcal{K}$
we have

$$V(\omega, k) \leq J_{\omega,k}(K). \tag{25}$$

If we assume in addition that a minimizer

$$\mathbf{u}^*(\omega, k) \in \mathop{\mathrm{argmin}}_{u \in \mathbf{U}} \Big\{ \langle \omega, L(u) \rangle + \sum_{y \in \mathbf{Y}} V(\Lambda_\Gamma(u, y)\omega, k+1)\, p(y|u,\omega) \Big\} \tag{26}$$

exists [33] for all $\omega$, $0 \leq k \leq M-1$, then the
separation structure controller $K^{\mathbf{u}^*}$ (recall section II B)
defined by (26) is optimal, i.e.

$$J_{\omega_0,0}(K^{\mathbf{u}^*}) = V(\omega_0, 0) \leq J_{\omega_0,0}(K) \tag{27}$$

for all $K \in \mathcal{K}$.

Example III.3 (Two-state system with feedback,
Example III.1 continued.) We solve the dynamic
programming equation (24) and determine the
optimal feedback controls as follows. For $k = M = 2$
we have

$$V(\omega, 2) = \langle \omega, X^2 \rangle = \omega_{11}$$

and hence for $k = 1$

$$V(\omega, 1) = \omega_{11} + \min[V_0(\omega, 1), V_1(\omega, 1)]$$

where $V_0(\omega, 1)$, $V_1(\omega, 1)$ are given in the
Appendix. Hence we obtain

$$\mathbf{u}^*(\omega, 1) = \begin{cases} 0 & \text{if } V_0(\omega, 1) \leq V_1(\omega, 1) \\ 1 & \text{if } V_0(\omega, 1) > V_1(\omega, 1). \end{cases}$$

At time $k = 0$ we have

$$V(\omega, 0) = \omega_{11} + \min[V_0(\omega, 0), V_1(\omega, 0)]$$

where $V_0(\omega, 0)$, $V_1(\omega, 0)$ are given in the Appendix,
which gives

$$\mathbf{u}^*(\omega, 0) = \begin{cases} 0 & \text{if } V_0(\omega, 0) \leq V_1(\omega, 0) \\ 1 & \text{if } V_0(\omega, 0) > V_1(\omega, 0). \end{cases}$$

The optimal risk-neutral feedback controller is
given by

$$u_0 = K^{\mathbf{u}^*}_0 = \mathbf{u}^*(\omega_0, 0), \qquad u_1 = K^{\mathbf{u}^*}_1(y_1) = \mathbf{u}^*(\omega_1, 1)$$

where $\omega_1 = \Lambda_\Gamma(u_0, y_1)\omega_0$. Note that the control
$u_1$ depends on $y_1$ through the conditional state $\omega_1$
(separation structure). A physical implementation
of the quantum system with optimal risk-neutral
feedback is shown in Figure 5.

FIG. 5: Physical realization of the two stages of the
two-state system with feedback using the optimal
risk-neutral controller $K^{\mathbf{u}^*}$ (with $\omega_0$ given by (17), we have
$u_0 = \mathbf{u}^*(\omega_0, 0) = 0$, $u_1 = \mathbf{u}^*(\omega_1, 1)$).

Let us consider the special case $\alpha = 0$ and $p = 0$,
with initial state (17). We then find that
$V_0(\omega_0, 0) = V_1(\omega_0, 0) = 0.5$, and hence we take
$\mathbf{u}^*(\omega_0, 0) = 0$; i.e. $u_0 = 0$.

Next, if $y_1 = -1$ is observed, we have
$\omega_1 = |-1\rangle\langle -1|$, $V_0(\omega_1, 1) = 1$ and $V_1(\omega_1, 1) = 0$.
Hence we take $\mathbf{u}^*(\omega_1, 1) = 1$, i.e. $u_1 = 1$. However,
if $y_1 = 1$ is observed, we have $\omega_1 = |1\rangle\langle 1|$,
$V_0(\omega_1, 1) = 0$ and $V_1(\omega_1, 1) = 1$; and hence we take
$\mathbf{u}^*(\omega_1, 1) = 0$, i.e. $u_1 = 0$. In either case we
achieve the desired state $\rho_2 = \omega_2 = |1\rangle\langle 1|$.

This action is the same as that seen before for
the controller $K$. The same controller is obtained
for $0 < \alpha < 0.5$ and $p = 0$, but $\omega_2$ will be a mixed
state. If $p \neq 0$ the optimal controller $K^{\mathbf{u}^*}$ will
result in control actions that in general differ from
those of $K$. □
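The dynamic programming recursion (24) for this example is small enough to evaluate by direct recursion. The sketch below is illustrative rather than the author's implementation: the function names are assumptions, states are 2x2 nested lists, the running cost uses $\langle \omega, L(u) \rangle = \omega_{11} + c(u)$, ties are broken in favour of $u = 0$, and outcomes of probability zero are skipped. The returned value is $V(\omega, k)$, i.e. $\omega_{11}$ plus the smaller of $V_0$ and $V_1$.

```python
def gamma(u, y, w, alpha):
    """Unnormalized two-state transfer Gamma(u, y) w of Example II.2."""
    w11, w22 = (w[1][1], w[0][0]) if u == 1 else (w[0][0], w[1][1])
    q_minus = 1 - alpha if y == -1 else alpha    # q(y | a = -1)
    q_plus = alpha if y == -1 else 1 - alpha     # q(y | a = +1)
    return [[q_minus * w11, 0.0], [0.0, q_plus * w22]]

def solve(w, k, M, alpha, p_cost):
    """Return (V(w, k), minimizing u) for equation (24); u is None at k == M."""
    if k == M:
        return w[0][0], None                     # V(w, M) = <w, N> = w11
    best = None
    for u in (0, 1):
        total = w[0][0] + (p_cost if u == 1 else 0.0)   # <w, L(u)>
        for y in (-1, 1):
            g = gamma(u, y, w, alpha)
            p = g[0][0] + g[1][1]                # p(y | u, w)
            if p > 0.0:
                w_next = [[g[0][0] / p, 0.0], [0.0, g[1][1] / p]]
                total += p * solve(w_next, k + 1, M, alpha, p_cost)[0]
        if best is None or total < best[0]:      # strict '<' keeps u = 0 on ties
            best = (total, u)
    return best

w0 = [[0.5, 0.5], [0.5, 0.5]]                    # initial state (17)
value0, u0 = solve(w0, 0, 2, alpha=0.0, p_cost=0.0)
_, u1_down = solve([[1.0, 0.0], [0.0, 0.0]], 1, 2, alpha=0.0, p_cost=0.0)
_, u1_up = solve([[0.0, 0.0], [0.0, 1.0]], 1, 2, alpha=0.0, p_cost=0.0)
```

For $\alpha = 0$, $p = 0$ this gives `u0 == 0`, `u1_down == 1` and `u1_up == 0`, matching the control actions described in the example.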

Remark III.4 Note that the optimal risk-neutral
controller $K^{\mathbf{u}^*}$ determined by (26) feeds back the
conditional state $\omega_k$, given by the SME (8), in
accordance with its separation structure, Figure 6.
We note that the conditional state $\omega_k$ obtained
from (8) provides the optimal means for calculating
estimates of observables (in the sense of minimum
mean square error), viz. $\langle \omega_k, B \rangle$, and can be
regarded as the optimal filter in this sense. This
means that from the point of view of optimal
risk-neutral control, the best thing to do is to make use
of the optimal filter (8), a dynamical quantity that
contains knowledge of the quantum system, as
obtained by the controller through the measurement
process embedded in $\Gamma$. □



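FIG. 6: Optimal risk-neutral controller $K^{\mathbf{u}^*}$ showing
separation structure and states of the physical system
$\omega_k$ and filter $\omega_k$. (Block diagram: the physical system,
with state $\omega_k$ given by eqn. (8), input $u$ and output $y$,
in closed loop with the feedback controller $K^{\mathbf{u}^*}$,
which consists of a filter with state $\omega_k$ given by
eqn. (8) followed by a control block.)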



Remark III. 5 A second remark we wish to make 
here concerns the well-known concept in control 
engineering of dual control, [?, Chapter 6.8]. This
concept relates to the dual function of the mea- 
surement feedback controller K, viz. (i) to alter 
the future evolution of the system, and (ii) to alter 
the future values of the available information. The 
optimal choice of K takes both of these factors into 
account. □ 

Remark III.6 The final remark for this section concerns feedback and robustness. Feedback is the most important concept in control engineering, and has a long history going back at least to the mechanical governors used to regulate the speed of steam engines. Feedback is used to compensate for disturbances and uncertainty, and feedback loops typically enjoy a robustness margin (e.g. gain margin and phase margin in classical control engineering), a measure of this compensation ability. Note that in the absence of disturbances and uncertainty, feedback is completely unnecessary and control can be achieved by a prescribed open loop controller. However, in reality both quantum and classical systems are subject to disturbances and uncertainty, e.g. (i) the influence of an environment, (ii) model error due to approximation and unknown parameters, and (iii) imprecise measurements. In the quantum context, there is the further complication that the act of measurement itself introduces randomness, and this could potentially reduce control effectiveness. Measurement feedback control of quantum systems is fundamentally a stochastic control problem containing non-classical characteristics. These considerations underscore the importance of feedback control and the need for robustness when controlling quantum systems. Robustness issues will be taken up again in section IV (see Example IV.8). □



IV. MULTIPLICATIVE COSTS AND 
RISK-SENSITIVE CONTROL 

We turn now to the risk-sensitive optimal control problem, the main object of this paper. The risk-sensitive cost functional we consider, a quantum generalization of LEQG [10], is

$$J^{\mu}(K) = \mathbf{E}\Big[\prod_{k=0}^{M-1} \langle \omega_k, e^{\mu L(u_k)} \rangle \, \langle \omega_M, e^{\mu N} \rangle\Big] \tag{28}$$

where $\mu > 0$ is a positive risk parameter, $L(u)$ is a cost function (as defined in section III), and $N$ is a non-negative observable. The conditional states $\omega_k$ are given by the quantum system model.

Remark IV.1 The risk-sensitive cost (28), by use of the exponential function, gives heavy weight to large values of the cost functions in the exponents. A system controlled by a controller minimizing this cost is therefore unlikely to experience large values of these quantities. Risk-sensitive controllers are known to enjoy some robustness properties against uncertainty in the model and external disturbances; see Example IV.8 and the references cited there. □
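The weighting effect described in Remark IV.1 can be illustrated with a purely classical toy calculation. The sketch below is not the quantum cost (28); it only compares the mean of a random cost with the exponential criterion $(1/\mu)\log \mathbf{E}[e^{\mu C}]$ for two hypothetical cost distributions (the names `steady` and `gamble` and their values are assumptions made for illustration).

```python
import math

def risk_neutral(costs, probs):
    """Expected cost E[C]."""
    return sum(c * q for c, q in zip(costs, probs))

def risk_sensitive(costs, probs, mu):
    """Exponential-of-cost criterion on a log scale: (1/mu) * log E[exp(mu * C)]."""
    return math.log(sum(q * math.exp(mu * c) for c, q in zip(costs, probs))) / mu

# Two hypothetical cost distributions with the same mean cost of 1.
steady = ([1.0], [1.0])               # always costs 1
gamble = ([0.0, 10.0], [0.9, 0.1])    # usually free, occasionally very costly

for name, (costs, probs) in (("steady", steady), ("gamble", gamble)):
    print(name, risk_neutral(costs, probs), risk_sensitive(costs, probs, mu=1.0))
```

Both distributions have the same risk-neutral cost, but the exponential criterion penalizes the distribution with the rare large cost far more heavily, which is the sense in which a minimizing controller avoids large cost values.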

The primary goal in this section is to find the optimal controller for the risk-sensitive cost functional (28). As noted, this cost functional is defined in terms of the conditional state $\omega_k$ of the quantum system. However, in order to solve this optimization problem, we need to express the cost functional in a manner that facilitates the use of optimal control methods. As in the classical LEQG case, this requires the introduction of a new state, which in general is unnormalized. To define this new unnormalized state, for which we use the notation $\hat\omega$ to distinguish such states from normalized states $\omega$, we need to use possibly nonlinear operators (observables) $B$ and (super) operators $R$. These nonlinear operators allow us to formulate and solve a general class of multiplicative cost optimal control problems for quantum systems [33].

Our risk-sensitive and multiplicative cost functional can be defined in terms of (super) operator valued costs $R(u)$ that satisfy the real multiplicative homogeneity property

$$R(u)\, r\hat\omega = r\, R(u)\hat\omega \tag{29}$$

for any real number $r$ and any $\hat\omega$, $u$. The risk-sensitive problem corresponds to particular choices of operator valued cost, Example IV.2. However, the fundamental equations in this section are valid for any operator valued cost $R(u)$ satisfying (29). Note that operator valued costs $R(u)$ are not in general quantum operations (because linearity and the inequality $\langle R(u)\hat\omega, I\rangle \le \langle \hat\omega, I\rangle$ need not hold in general).

Example IV.2 We give two examples of operator valued costs.

(i) A specific linear form for $R(u)$ is

$$R(u)\hat\omega = \sum_c Z_c(u)\,\hat\omega\, Z_c^\dagger(u) \tag{30}$$

where $\{Z_c(u)\}$ is a family of cost functions (section III). The adjoint $R^\dagger(u)$ acts on observables $B$ via

$$R^\dagger(u)B = \sum_c Z_c^\dagger(u)\, B\, Z_c(u), \tag{31}$$

and thereby defines a linear functional on unnormalized states by

$$\langle \hat\omega, R^\dagger(u)B\rangle = \sum_c \langle Z_c(u)\,\hat\omega\, Z_c^\dagger(u), B\rangle \tag{32}$$

(we have written this explicitly to facilitate comparison with (34) below).

(ii) An operator valued cost $R(u)$ corresponding to the risk-sensitive cost (28) can be defined as follows. Let $L(u)$ be a cost function, and $\mu > 0$. Then set

$$R(u)\hat\omega = \frac{\langle \hat\omega, e^{\mu L(u)}\rangle}{\langle \hat\omega, I\rangle}\,\hat\omega. \tag{33}$$

Note that $R(u)$ is nonlinear, but satisfies the real multiplicative homogeneity condition (29). The adjoint operator $R^\dagger(u)$ applied to an operator $B$ is a nonlinear functional of $\hat\omega$ given by

$$\langle \hat\omega, R^\dagger(u)B\rangle = \frac{\langle \hat\omega, e^{\mu L(u)}\rangle}{\langle \hat\omega, I\rangle}\,\langle \hat\omega, B\rangle \tag{34}$$

(cf. (32) above).

The relationship between $R(u)$ and the risk-sensitive cost (28) will be explained in Example IV.4 below.

□
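The two properties claimed for the cost (33), homogeneity (29) and nonlinearity, can be checked numerically. The following is a minimal sketch under simplifying assumptions: states and observables are taken to be diagonal $2\times 2$ matrices stored as lists of their diagonal entries, and the cost $L = \mathrm{diag}(1, 0)$ is a hypothetical choice; the helper names `pairing` and `R` are introduced here for illustration only.

```python
import math

def pairing(w, b):
    """<w, B> = tr(w B) for a diagonal state w and a diagonal observable B."""
    return sum(wi * bi for wi, bi in zip(w, b))

def R(w, L, mu):
    """Operator valued cost (33): w scaled by <w, exp(mu L)> / <w, I>."""
    exp_L = [math.exp(mu * l) for l in L]
    factor = pairing(w, exp_L) / sum(w)
    return [factor * wi for wi in w]

L_diag, mu = [1.0, 0.0], 2.0          # hypothetical cost L = diag(1, 0)
w1, w2 = [1.0, 0.0], [0.0, 1.0]

# Homogeneity (29): R(r w) = r R(w).
r, w = 3.0, [0.2, 0.7]
lhs = R([r * x for x in w], L_diag, mu)
rhs = [r * x for x in R(w, L_diag, mu)]

# Nonlinearity: R(w1 + w2) differs from R(w1) + R(w2).
joint = R([a + b for a, b in zip(w1, w2)], L_diag, mu)
split = [a + b for a, b in zip(R(w1, L_diag, mu), R(w2, L_diag, mu))]
print(lhs, rhs, joint, split)
```

The scaling factor in (33) depends only on the ratio $\langle\hat\omega, e^{\mu L}\rangle / \langle\hat\omega, I\rangle$, which is why rescaling $\hat\omega$ passes through $R$ while addition does not.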

Given an operator valued cost $R(u)$, we shall find it convenient to introduce an operator $\Gamma_R(u, y)$ defined by

$$\Gamma_R(u, y) = \Gamma(u, y) R(u). \tag{35}$$

In general, $\Gamma_R$ is not normalized:

$$\sum_{y \in Y} \langle \Gamma_R(u, y)\hat\omega, I\rangle = \langle R(u)\hat\omega, I\rangle \neq \langle \hat\omega, I\rangle.$$

The operator $\Gamma_R$ will be used to define a new state evolution as follows. Define an operator $\Lambda_{\Gamma,R}$ by

$$\Lambda_{\Gamma,R}(u, y)\hat\omega = \frac{\Gamma_R(u, y)\hat\omega}{p_R(y|u,\hat\omega)} \tag{36}$$

where

$$p_R(y|u,\hat\omega) = \frac{\langle \Gamma_R(u, y)\hat\omega, I\rangle}{\langle R(u)\hat\omega, I\rangle}. \tag{37}$$

In general, the state $\Lambda_{\Gamma,R}(u, y)\hat\omega$ is unnormalized. However, $p_R(y|u,\hat\omega)$ is a probability distribution, since it is easy to check that

$$\sum_{y \in Y} p_R(y|u,\hat\omega) = 1.$$

However, we point out that

$$\langle \Lambda_{\Gamma,R}(u, y)\hat\omega, I\rangle = \langle R(u)\hat\omega, I\rangle. \tag{38}$$

This unnormalized state transition operator arises in the dynamic programming equation, as we shall see below.

Associated with the operator $\Lambda_{\Gamma,R}$ is the dynamical equation

$$\hat\omega_{k+1} = \Lambda_{\Gamma,R}(u_k, y_{k+1})\hat\omega_k \tag{39}$$

where $y_{k+1}$ is distributed according to the probability distribution $p_R(y_{k+1}|u_k,\hat\omega_k)$ given by (37). This is a controlled Markov chain, with unnormalized states $\hat\omega_k$. It is a modified stochastic master equation corresponding to the operator $\Gamma_R$. Under the action of a controller $K \in \mathcal{K}$ the stochastic process $\hat\omega_k$ is determined by (39) and $u_k = K_k(y_{1,k})$. The separation structure controller in this case takes the following form. Given a function $u(\hat\omega, k)$ and initial state $\hat\omega_0$ we define a controller $K^{u}_{\hat\omega_0} \in \mathcal{K}$ by

$$u_k = u(\hat\omega_k, k)$$

where $\hat\omega_k$ is given by (39), $0 \le k \le M$, with initial condition $\hat\omega_0$.

Let $M > 0$ be a positive integer indicating a finite time interval $k = 0, \ldots, M$. For each $k$, given a sequence of control values $u_{k,M-1} = u_k, \ldots, u_{M-1}$ and measurement values $y_{k+1,M} = y_{k+1}, \ldots, y_M$, define random cost observables $G_k$ by the recursion

$$G_k = R^\dagger(u_k)\Gamma^\dagger(u_k, y_{k+1}) G_{k+1}, \qquad G_M = F \tag{40}$$

where $0 \le k \le M-1$ and $F$ is a non-negative linear observable. It is evident that $G_k$ is real multiplicative homogeneous if $G_{k+1}$ is (recall (29)).
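The normalization statements around (35)-(38) can be checked numerically on a small model. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses a two-state system restricted to the diagonal entries $(\hat\omega_{11}, \hat\omega_{22})$, with $\Gamma(u, y)$ taken to be an optional flip ($u = 1$) followed by a measurement with error probability `ALPHA`, and with $\langle\hat\omega, e^{\mu L(u)}\rangle = e^{\mu p u}(e^{\mu}\hat\omega_{11} + \hat\omega_{22})$. The function names and parameter values are hypothetical.

```python
import math

ALPHA, MU, P = 0.25, 2.0, 0.2   # measurement error, risk parameter, control cost

def gamma(u, y, w):
    """Assumed Gamma(u, y) on the diagonal (w11, w22): optional flip, then noisy measurement."""
    a, b = (w[1], w[0]) if u == 1 else (w[0], w[1])
    if y == -1:
        return ((1 - ALPHA) * a, ALPHA * b)
    return (ALPHA * a, (1 - ALPHA) * b)

def R(u, w):
    """Operator valued cost (33) for the assumed cost L(u)."""
    c = math.exp(MU * P * u) * (math.exp(MU) * w[0] + w[1]) / (w[0] + w[1])
    return (c * w[0], c * w[1])

def gamma_R(u, y, w):
    """Equation (35): Gamma_R(u, y) = Gamma(u, y) R(u)."""
    return gamma(u, y, R(u, w))

def p_R(u, y, w):
    """Equation (37)."""
    return sum(gamma_R(u, y, w)) / sum(R(u, w))

def lam_R(u, y, w):
    """Equation (36): unnormalized state transition."""
    q = p_R(u, y, w)
    return tuple(x / q for x in gamma_R(u, y, w))

w = (0.3, 0.9)                   # an arbitrary unnormalized state
for u in (0, 1):
    total = sum(p_R(u, y, w) for y in (-1, 1))
    traces = [sum(lam_R(u, y, w)) for y in (-1, 1)]
    print(u, total, traces, sum(R(u, w)))
```

For each control value the probabilities $p_R$ sum to one, while the trace of every updated state equals $\langle R(u)\hat\omega, I\rangle$ rather than $\langle\hat\omega, I\rangle$, in line with (38).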

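We next define the multiplicative cost functional

$$J^{\mu}_{\hat\omega,0}(K) = \sum_{y_{1,M} \in Y^M} \langle \hat\omega, G_0(K(y_{1,M})_{0,M-1}, y_{1,M})\rangle \tag{41}$$

where $K \in \mathcal{K}$ is a measurement feedback controller.

Lemma IV.3 The cost functional $J^{\mu}_{\hat\omega,0}(K)$ defined by (41) is given by the alternate expression

$$J^{\mu}_{\hat\omega,0}(K) = \mathbf{E}\big[\langle \hat\omega_M, F\rangle\big] \tag{42}$$

where $\hat\omega_i$, $i = 0, \ldots, M$ is the solution of the recursion (39) with initial state $\hat\omega_0 = \hat\omega$ under the action of the controller $K$, and the expectation is taken with respect to the measurement sequence distributed according to (37).

Proof. We have

$$\begin{aligned}
\langle \hat\omega_0, G_0\rangle &= \langle \hat\omega_0, R^\dagger(u_0)\Gamma^\dagger(u_0, y_1) G_1\rangle \\
&= \langle R(u_0)\hat\omega_0, \Gamma^\dagger(u_0, y_1) G_1\rangle \\
&= \langle \Gamma(u_0, y_1) R(u_0)\hat\omega_0, G_1\rangle \\
&= \langle \hat\omega_1, G_1\rangle\, p_R(y_1|u_0, \hat\omega_0)
\end{aligned}$$

where $\hat\omega_1 = \Lambda_{\Gamma,R}(u_0, y_1)\hat\omega_0$ and $p_R(y_1|u_0, \hat\omega_0)$ is given by (37). Iterating in this way we see that (41) and (42) are equivalent. These properties use the real multiplicative homogeneity property of $G_k$. □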



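Example IV.4 (Continuation of Example IV.2 (ii).) We now show that when $R(u)$ is given by (33) and

$$F = e^{\mu N},$$

where $N$ is a non-negative linear observable, the multiplicative cost functional $J^{\mu}_{\hat\omega,0}(K)$ defined by (41) equals the risk-sensitive cost functional (28).

Proceeding as in the proof of Lemma IV.3 we have

$$\begin{aligned}
\langle \hat\omega_0, G_0\rangle &= \langle \hat\omega_0, R^\dagger(u_0)\Gamma^\dagger(u_0, y_1) G_1\rangle \\
&= \langle \Gamma(u_0, y_1) R(u_0)\hat\omega_0, G_1\rangle \\
&= \langle \Gamma(u_0, y_1)\hat\omega_0, G_1\rangle \frac{\langle \hat\omega_0, e^{\mu L(u_0)}\rangle}{\langle \hat\omega_0, 1\rangle} \\
&= \frac{\langle \Gamma(u_0, y_1)\hat\omega_0, G_1\rangle / \langle \hat\omega_0, 1\rangle}{\langle \Gamma(u_0, y_1)\hat\omega_0, 1\rangle / \langle \hat\omega_0, 1\rangle}\, \langle \hat\omega_0, e^{\mu L(u_0)}\rangle\, \frac{\langle \Gamma(u_0, y_1)\hat\omega_0, 1\rangle}{\langle \hat\omega_0, 1\rangle} \\
&= \langle \Lambda_\Gamma(u_0, y_1)\omega_0, G_1\rangle\, \langle \hat\omega_0, e^{\mu L(u_0)}\rangle\, p(y_1|u_0, \omega_0)
\end{aligned}$$

where $\omega_0 = \hat\omega_0 / \langle \hat\omega_0, 1\rangle$, and $\Lambda_\Gamma(u, y)$ and $p(y|u, \omega)$ are as defined in the quantum system model. Now if $\hat\omega_0 = \omega_0$ is normalized, with $\langle \omega_0, 1\rangle = 1$, then we have shown that

$$\begin{aligned}
\langle \hat\omega_0, G_0\rangle &= \langle \Lambda_\Gamma(u_0, y_1)\omega_0, G_1\rangle\, \langle \omega_0, e^{\mu L(u_0)}\rangle\, p(y_1|u_0, \omega_0) \\
&= \langle \omega_1, G_1\rangle\, \langle \omega_0, e^{\mu L(u_0)}\rangle\, p(y_1|u_0, \omega_0)
\end{aligned}$$

where $\omega_1 = \Lambda_\Gamma(u_0, y_1)\omega_0$ is the normalized state evolving according to the quantum system model. Note that $\langle \omega_1, 1\rangle = 1$. Continuing in this way we see that (41) equals the risk-sensitive cost functional (28), using Lemma IV.3.

It can also be checked that $\omega_k$ and $\hat\omega_k$ are related simply via

$$\hat\omega_k = \prod_{i=0}^{k-1} \langle \omega_i, e^{\mu L(u_i)}\rangle\; \omega_k. \tag{43}$$

□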
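The equality established in Example IV.4 can be verified by brute-force enumeration on a small model. The sketch below is illustrative only and rests on assumptions: a two-state system restricted to diagonal entries, $\Gamma(u, y)$ taken as an optional flip followed by a noisy measurement, the cost $\langle\omega, e^{\mu L(u)}\rangle = e^{\mu p u}(e^{\mu}\omega_{11} + \omega_{22})$, and a hypothetical feedback law `controller`. It evaluates the risk-sensitive cost (28) along the normalized chain and the alternate expression (42) along the unnormalized chain (39).

```python
import math
from itertools import product

ALPHA, MU, P, M = 0.25, 2.0, 0.2, 2

def gamma(u, y, w):
    """Assumed Gamma(u, y): optional flip (u = 1), then measurement with error ALPHA."""
    a, b = (w[1], w[0]) if u == 1 else (w[0], w[1])
    if y == -1:
        return ((1 - ALPHA) * a, ALPHA * b)
    return (ALPHA * a, (1 - ALPHA) * b)

def exp_cost(u, w):
    """<w, e^{mu L(u)}> for the assumed cost L(u)."""
    return math.exp(MU * P * u) * (math.exp(MU) * w[0] + w[1])

def terminal(w):
    """<w, e^{mu N}> with N = diag(1, 0)."""
    return math.exp(MU) * w[0] + w[1]

def controller(ys):
    """Hypothetical feedback law: flip only after observing y = -1."""
    return 1 if ys and ys[-1] == -1 else 0

def cost_28(w0):
    """Risk-sensitive cost (28), using the normalized conditional states."""
    total = 0.0
    for ys in product((-1, 1), repeat=M):
        w, prob, weight = w0, 1.0, 1.0
        for k in range(M):
            u = controller(ys[:k])
            weight *= exp_cost(u, w)
            g = gamma(u, ys[k], w)
            q = sum(g)                       # p(y | u, w) for normalized w
            w, prob = (g[0] / q, g[1] / q), prob * q
        total += prob * weight * terminal(w)
    return total

def cost_42(w0):
    """Alternate expression (42), using the unnormalized states of (39)."""
    total = 0.0
    for ys in product((-1, 1), repeat=M):
        w, prob = w0, 1.0
        for k in range(M):
            u = controller(ys[:k])
            c = exp_cost(u, w) / sum(w)      # scalar factor in (33)
            g = gamma(u, ys[k], (c * w[0], c * w[1]))
            q = sum(g) / (c * sum(w))        # p_R from (37)
            w, prob = (g[0] / q, g[1] / q), prob * q
        total += prob * terminal(w)
    return total

print(cost_28((0.5, 0.5)), cost_42((0.5, 0.5)))
```

The two numbers agree because, for the cost (33), $p_R$ coincides with the physical measurement probability and the product of cost factors is carried inside the unnormalized state, as in (43).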



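To solve the optimal control problem for the cost functional (41), we define the cost to go

$$J^{\mu}_{\hat\omega,k}(K) = \sum_{y_{k+1,M} \in Y^{M-k}} \langle \hat\omega, G_k(K(y_{k+1,M})_{k,M-1}, y_{k+1,M})\rangle \tag{44}$$

and the corresponding dynamic programming equation

$$W(\hat\omega, k) = \inf_{u \in U}\Big\{\sum_{y \in Y} W(\Lambda_{\Gamma,R}(u, y)\hat\omega, k+1)\, p_R(y|u,\hat\omega)\Big\}, \qquad W(\hat\omega, M) = \langle \hat\omega, F\rangle \tag{45}$$

where $0 \le k \le M-1$.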

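Theorem IV.5 Let $W(\hat\omega, k)$, $0 \le k \le M$, be the solution of the dynamic programming equation (45). (i) Then for any $K \in \mathcal{K}$ we have

$$W(\hat\omega, k) \le J^{\mu}_{\hat\omega,k}(K). \tag{46}$$

(ii) Assume in addition that the minimizer

$$u^\star(\hat\omega, k) \in \operatorname*{argmin}_{u \in U}\Big\{\sum_{y \in Y} W(\Lambda_{\Gamma,R}(u, y)\hat\omega, k+1)\, p_R(y|u,\hat\omega)\Big\} \tag{47}$$

exists for all $\hat\omega$, $0 \le k \le M-1$. Then the separation structure controller $K^{u^\star}_{\hat\omega_0}$ defined by (47) is optimal for problem (41), i.e. $J^{\mu}_{\hat\omega_0,0}(K) \ge J^{\mu}_{\hat\omega_0,0}(K^{u^\star}_{\hat\omega_0})$ for all $K \in \mathcal{K}$.

Proof. We prove part (i) by induction. Let $K \in \mathcal{K}$. For $k = M$, we have

$$W(\hat\omega, M) = \langle \hat\omega, F\rangle = J^{\mu}_{\hat\omega,M}(K)$$

so (46) holds for $k = M$. Next, we assume (46) holds for $k+1$, i.e.

$$W(\hat\omega, k+1) \le J^{\mu}_{\hat\omega,k+1}(K). \tag{48}$$

Now by (45), (48) and (37),

$$\begin{aligned}
W(\hat\omega, k) &\le \sum_{y_{k+1} \in Y} W(\Lambda_{\Gamma,R}(u_k, y_{k+1})\hat\omega, k+1)\, p_R(y_{k+1}|u_k,\hat\omega) \\
&\le \sum_{y_{k+1} \in Y}\ \sum_{y_{k+2,M} \in Y^{M-(k+1)}} \langle \Lambda_{\Gamma,R}(u_k, y_{k+1})\hat\omega, G_{k+1}\rangle\, p_R(y_{k+1}|u_k,\hat\omega) \\
&= \sum_{y_{k+1} \in Y}\ \sum_{y_{k+2,M} \in Y^{M-(k+1)}} \langle \hat\omega, R^\dagger(u_k)\Gamma^\dagger(u_k, y_{k+1}) G_{k+1}\rangle \\
&= J^{\mu}_{\hat\omega,k}(K)
\end{aligned}$$

as required.

Part (ii) follows from the proof of part (i), with $k = 0$, since at every step we have equality and so

$$W(\hat\omega_0, 0) = J^{\mu}_{\hat\omega_0,0}(K^{u^\star}_{\hat\omega_0}).$$

Hence $J^{\mu}_{\hat\omega_0,0}(K) \ge J^{\mu}_{\hat\omega_0,0}(K^{u^\star}_{\hat\omega_0})$ for all $K \in \mathcal{K}$. The real multiplicative homogeneity property of $G_k$ has been used here also. □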

Remark IV.6 Note that the optimal multiplicative cost/risk-sensitive controller determined by (47) feeds back the unnormalized conditional state $\hat\omega_k$, given by the modified SME (39), Figure 7. This means that from the point of view of optimal risk-sensitive or multiplicative control, the best thing to do involves use of a dynamical quantity that not only contains knowledge (as measured by the controller) of the quantum system, but also contains information about the purpose of the controller. This should be contrasted with the risk-neutral case, section III. Note in particular that the modified SME (39) is no longer the optimal filter from the point of view of seeking the best estimate of observables (c.f. Remark III.4). Further, the concept of dual control (Remark III.5) has greater weight here, since the cost $R(u)$ appears explicitly in the controller dynamics (39). While the optimal multiplicative cost/risk-sensitive controller has a separation structure, in the sense of a decomposition into a dynamical filter part and static control part, the task of estimation is not separated from the task of control, Figure 7. We emphasize that the multiplicative cost/risk-sensitive conditional state is defined only in the context of these specific control objectives, where it is used in specific feedback situations. □



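[Figure 7: block diagram. The physical system (state $\omega_k$, quantum system model) receives the input $u$ and produces the output $y$; the feedback controller consists of a filter (state $\hat\omega_k$, eqn. (39)) followed by the control $u^\star(\hat\omega_k, k)$.]

FIG. 7: Optimal multiplicative/risk-sensitive controller $K^{u^\star}_{\hat\omega_0}$ showing separation structure and states of the physical system $\omega_k$ and filter $\hat\omega_k$.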



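Example IV.7 (Two-state system with feedback, Example III.3 continued.) We now consider the risk-sensitive optimal control problem for the two-state example, with operator valued cost $R(u)$ defined by (33) where $L(u)$ is given by (19). If we write the density matrix as (15), then

$$R(u)\hat\omega = e^{\mu p u}\,\frac{e^{\mu}\hat\omega_{11} + \hat\omega_{22}}{\hat\omega_{11} + \hat\omega_{22}} \begin{pmatrix} \hat\omega_{11} & \hat\omega_{12} \\ \hat\omega_{12}^{*} & \hat\omega_{22} \end{pmatrix}, \qquad u \in \{0, 1\}.$$

The risk-sensitive controlled transfer operators $\Gamma_R(u, y)$ are defined by

$$\Gamma_R(u, y)\hat\omega = \Gamma(u, y) R(u)\hat\omega.$$

Explicitly, we have

$$\Gamma_R(u, y)\hat\omega = e^{\mu p u}\,\frac{e^{\mu}\hat\omega_{11} + \hat\omega_{22}}{\hat\omega_{11} + \hat\omega_{22}}\, \Gamma(u, y)\hat\omega,$$

that is,

$$\Gamma_R(0, -1)\hat\omega = \frac{e^{\mu}\hat\omega_{11} + \hat\omega_{22}}{\hat\omega_{11} + \hat\omega_{22}} \begin{pmatrix} (1-\alpha)\hat\omega_{11} & 0 \\ 0 & \alpha\hat\omega_{22} \end{pmatrix}, \qquad
\Gamma_R(0, 1)\hat\omega = \frac{e^{\mu}\hat\omega_{11} + \hat\omega_{22}}{\hat\omega_{11} + \hat\omega_{22}} \begin{pmatrix} \alpha\hat\omega_{11} & 0 \\ 0 & (1-\alpha)\hat\omega_{22} \end{pmatrix},$$

$$\Gamma_R(1, -1)\hat\omega = e^{\mu p}\,\frac{e^{\mu}\hat\omega_{11} + \hat\omega_{22}}{\hat\omega_{11} + \hat\omega_{22}} \begin{pmatrix} (1-\alpha)\hat\omega_{22} & 0 \\ 0 & \alpha\hat\omega_{11} \end{pmatrix}, \qquad
\Gamma_R(1, 1)\hat\omega = e^{\mu p}\,\frac{e^{\mu}\hat\omega_{11} + \hat\omega_{22}}{\hat\omega_{11} + \hat\omega_{22}} \begin{pmatrix} \alpha\hat\omega_{22} & 0 \\ 0 & (1-\alpha)\hat\omega_{11} \end{pmatrix}.$$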

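The dynamic programming equation (45) is solved and the optimal feedback controls are found as follows. First, for $k = M = 2$ we have

$$W(\hat\omega, 2) = \langle \hat\omega, e^{\mu N}\rangle = e^{\mu}\hat\omega_{11} + \hat\omega_{22}$$

and then for $k = 1$

$$W(\hat\omega, 1) = \min[W_0(\hat\omega, 1), W_1(\hat\omega, 1)]$$

where $W_0(\hat\omega, 1)$ and $W_1(\hat\omega, 1)$ are given in Appendix A, and

$$u^\star(\hat\omega, 1) = \begin{cases} 0 & \text{if } W_0(\hat\omega, 1) \le W_1(\hat\omega, 1) \\ 1 & \text{if } W_0(\hat\omega, 1) > W_1(\hat\omega, 1). \end{cases}$$

Next, for $k = 0$,

$$W(\hat\omega, 0) = \min[W_0(\hat\omega, 0), W_1(\hat\omega, 0)]$$

where $W_0(\hat\omega, 0)$ and $W_1(\hat\omega, 0)$ are given in Appendix A, and

$$u^\star(\hat\omega, 0) = \begin{cases} 0 & \text{if } W_0(\hat\omega, 0) \le W_1(\hat\omega, 0) \\ 1 & \text{if } W_0(\hat\omega, 0) > W_1(\hat\omega, 0). \end{cases}$$

The optimal feedback controller is given by

$$u_0 = K^{u^\star}_{\hat\omega_0,0} = u^\star(\hat\omega_0, 0), \qquad u_1 = K^{u^\star}_{\hat\omega_0,1}(y_1) = u^\star(\hat\omega_1, 1)$$

where $\hat\omega_1 = \Lambda_{\Gamma,R}(u_0, y_1)\hat\omega_0$. Again, we see the separation structure, where here the control $u_1$ depends on $y_1$ through the unnormalized conditional state $\hat\omega_1$. A physical implementation of this controller is shown in Figure 8.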



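[Figure 8: two-stage diagram. The physical system evolves from $\omega_0$ to $\omega_2$ through two control and measurement stages, with measurement outcomes $y_1 = \pm 1$; the controller consists of a filter producing $\hat\omega_1$ and the control $u_1 = u^\star(\hat\omega_1, 1)$.]

FIG. 8: Physical realization of the two stages of the two-state system with feedback using the optimal risk-sensitive controller $K^{u^\star}_{\hat\omega_0}$ (with $\hat\omega_0 = \omega_0$ given by (17), we have $u_0 = u^\star(\hat\omega_0, 0) = 0$, $u_1 = u^\star(\hat\omega_1, 1)$).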



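To see the effect of the risk-sensitive controller, consider the initial state $\hat\omega_0 = \omega_0$ given by (17), and parameter values $\alpha = 0.25$, $\mu = 2$. We find that $W_0(\hat\omega_0, 0) \le W_1(\hat\omega_0, 0)$, and hence we take $u_0 = u^\star(\hat\omega_0, 0) = 0$.

If $y_1 = -1$ is measured, we find that

$$\hat\omega_1 = \begin{pmatrix} 3.1459 & 0 \\ 0 & 1.04863 \end{pmatrix}, \qquad \omega_1 = \begin{pmatrix} 0.75 & 0 \\ 0 & 0.25 \end{pmatrix} \qquad \text{with prob. } 0.5,$$

and

$$u_1 = u^\star(\hat\omega_1, 1) = \begin{cases} 1 & \text{if } 0 < p < 0.4 \\ 0 & \text{if } p > 0.4. \end{cases} \tag{49}$$

If, on the other hand, $y_1 = 1$ is measured, we find that

$$\hat\omega_1 = \begin{pmatrix} 1.04863 & 0 \\ 0 & 3.1459 \end{pmatrix}, \qquad \omega_1 = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.75 \end{pmatrix} \qquad \text{with prob. } 0.5,$$

and

$$u_1 = u^\star(\hat\omega_1, 1) = 0$$

for any value of $p$. When the control cost $p = 0.2$, the final (non-selective) state is given by

$$\omega_2 = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.75 \end{pmatrix} = 0.25\,|{-1}\rangle\langle{-1}| + 0.75\,|1\rangle\langle 1|.$$

This state does not equal the desired pure state $|1\rangle\langle 1|$, a reflection of the level of measurement uncertainty $\alpha = 0.25$ and the presence of a non-zero control penalty.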

To compare with the risk-neutral version of this problem, we find that the threshold in (49) for the risk-neutral problem is $p = 0.75$. This means that the risk-neutral controller will be active, i.e. select $u = 1$, for a larger range of values of the control cost $p$ than is the case for the risk-sensitive controller. This is consistent with the description of the example in the cited references, where the risk-neutral controller is more aggressive than the risk-sensitive controller. □
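The numbers quoted in Example IV.7 can be reproduced with a short backward recursion. The sketch below is an illustration under assumptions rather than the paper's own computation: the state is reduced to its diagonal entries, $\Gamma(u, y)$ is taken to be an optional flip followed by a measurement with error probability `ALPHA`, the cost is $\langle\hat\omega, e^{\mu L(u)}\rangle = e^{\mu p u}(e^{\mu}\hat\omega_{11} + \hat\omega_{22})$, and the initial state is assumed to have diagonal $(0.5, 0.5)$. All function names are hypothetical.

```python
import math

ALPHA, MU, M = 0.25, 2.0, 2

def gamma(u, y, w):
    """Assumed Gamma(u, y): optional flip (u = 1), then measurement with error ALPHA."""
    a, b = (w[1], w[0]) if u == 1 else (w[0], w[1])
    if y == -1:
        return ((1 - ALPHA) * a, ALPHA * b)
    return (ALPHA * a, (1 - ALPHA) * b)

def R(u, w, p):
    """Operator valued cost (33) for the assumed cost L(u) with control cost p."""
    c = math.exp(MU * p * u) * (math.exp(MU) * w[0] + w[1]) / (w[0] + w[1])
    return (c * w[0], c * w[1])

def p_R(u, y, w, p):
    """Equation (37)."""
    return sum(gamma(u, y, R(u, w, p))) / sum(R(u, w, p))

def lam_R(u, y, w, p):
    """Equation (36)."""
    q = p_R(u, y, w, p)
    return tuple(x / q for x in gamma(u, y, R(u, w, p)))

def solve(w, k, p):
    """Return (W(w, k), u*(w, k)) from the dynamic programming equation (45)."""
    if k == M:
        return math.exp(MU) * w[0] + w[1], None      # <w, e^{mu N}>
    best = None
    for u in (0, 1):
        val = sum(solve(lam_R(u, y, w, p), k + 1, p)[0] * p_R(u, y, w, p)
                  for y in (-1, 1))
        if best is None or val < best[0]:            # ties resolved in favour of u = 0
            best = (val, u)
    return best

w0 = (0.5, 0.5)
w1_minus = lam_R(0, -1, w0, p=0.2)   # filter state after u0 = 0, y1 = -1
w1_plus = lam_R(0, 1, w0, p=0.2)     # filter state after u0 = 0, y1 = +1
print(w1_minus, w1_plus)
print(solve(w0, 0, 0.2)[1], solve(w1_minus, 1, 0.39)[1], solve(w1_minus, 1, 0.41)[1])
```

Under these assumptions the filter states match the values 3.1459 and 1.04863 quoted above, the first control is $u_0 = 0$, and the control after $y_1 = -1$ switches from 1 to 0 as $p$ crosses roughly 0.4, consistent with (49).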

We conclude with an example which indicates 
the likely robustness properties of the risk-sensitive 
controller and the relationship between the risk- 
neutral and risk-sensitive problems. 

Example IV.8 We consider the risk-sensitive cost functional (28), where the operator valued cost $R(u)$ has the form (33), and $F = e^{\mu N}$.

Robustness. To describe the robustness properties of the risk-sensitive controller, we follow [15] and make use of the following general convex duality formula (see, e.g. [Chapter 1.4]):

$$\log \mathbf{E}_P[e^{f}] = \sup_{Q}\{\mathbf{E}_Q[f] - RE(Q \,\|\, P)\} \tag{50}$$

where $P$ and $Q$ are probability distributions [34], and where the relative entropy is defined by (see, e.g., [Chapter 11])

$$RE(Q \,\|\, P) = \mathbf{E}_Q\Big[\log \frac{dQ}{dP}\Big].$$

To apply formula (50), we proceed as follows. Let $\Gamma_{nom}$ be the nominal operator used for design of the optimal risk-sensitive controller, here denoted $K^{\star}_{\mu,nom}$. Together, $\Gamma_{nom}$ and $K^{\star}_{\mu,nom}$ determine a probability distribution, here denoted $P_{nom}$. In reality, the nominal $\Gamma_{nom}$ need not equal the operator for the "true" system, denoted $\Gamma_{true}$. The controller $K^{\star}_{\mu,nom}$ is applied to the true system, resulting in a probability distribution $P_{true}$.

We write $\mu = 1/\gamma^2$, and apply (50) to obtain the following inequality ($P = P_{nom}$, $Q = P_{true}$):

$$\begin{aligned}
\gamma^2 \log \mathbf{E}_{P_{nom}}\Big[\prod_{k=0}^{M-1} \langle \omega_k, e^{\mu L(u_k)}\rangle \langle \omega_M, e^{\mu N}\rangle\Big]
&\ge \mathbf{E}_{P_{true}}\Big[\gamma^2 \log\Big(\prod_{k=0}^{M-1} \langle \omega_k, e^{\mu L(u_k)}\rangle \langle \omega_M, e^{\mu N}\rangle\Big)\Big] - \gamma^2 RE(P_{true} \,\|\, P_{nom}) \\
&= \mathbf{E}_{P_{true}}\Big[\gamma^2\Big(\sum_{k=0}^{M-1} \log \langle \omega_k, e^{\mu L(u_k)}\rangle + \log \langle \omega_M, e^{\mu N}\rangle\Big)\Big] - \gamma^2 RE(P_{true} \,\|\, P_{nom}).
\end{aligned}$$

Since $\gamma^2 \log \langle \omega, e^{\mu L}\rangle \ge \langle \omega, L\rangle$ for a normalized state $\omega$ (Jensen's inequality, with $\gamma^2\mu = 1$), this implies the important bound:

$$\mathbf{E}_{P_{true}}\Big[\sum_{k=0}^{M-1} \langle \omega_k, L(u_k)\rangle + \langle \omega_M, N\rangle\Big] \le \gamma^2 \log \mathbf{E}_{P_{nom}}\Big[\prod_{k=0}^{M-1} \langle \omega_k, e^{\mu L(u_k)}\rangle \langle \omega_M, e^{\mu N}\rangle\Big] + \gamma^2 RE(P_{true} \,\|\, P_{nom}). \tag{51}$$

The LHS of (51) is the risk-neutral cost criterion, evaluated using the true system model $P_{true}$ and the controller $K^{\star}_{\mu,nom}$ designed using the nominal model $P_{nom}$. Inequality (51) bounds this cost by two terms: the first term is related to the optimal risk-sensitive cost (28), while the second is the relative entropy term, which is a measure of the "distance" between the true and nominal systems. The number $\gamma^2 = 1/\mu > 0$ is a "robustness gain" parameter, which we would like to be as small as possible for maximum robustness, as in robust control, where the relative entropy term is a measure of the "energy" in the disturbance or uncertainty. This shows that the risk-sensitive controller enjoys good performance, as measured by the risk-neutral criterion, under nominal conditions ($P_{true} = P_{nom}$), and acceptable performance in other than nominal conditions ($P_{true} \neq P_{nom}$), as implied by the bound. In summary, risk-sensitive controllers enjoy enhanced robustness (recall Remark III.6).

Relationship between the risk-neutral and risk-sensitive value functions. We indicate briefly how the results of [Theorem 5.5] apply in the present context. Indeed, the reader may check that for small $\mu > 0$ one has

$$\frac{1}{\mu} \log \frac{\langle \hat\omega, \exp(\mu N)\rangle}{\langle \hat\omega, 1\rangle} \approx \frac{\langle \hat\omega, N\rangle}{\langle \hat\omega, 1\rangle}.$$

This suggests the relation

$$\lim_{\mu \downarrow 0} \frac{1}{\mu} \log \frac{W(\hat\omega, k)}{\langle \hat\omega, 1\rangle} = \frac{V(\hat\omega, k)}{\langle \hat\omega, 1\rangle}, \tag{52}$$

which says that a logarithmic risk-sensitive optimal cost tends to the optimal risk-neutral cost as the parameter $\mu \to 0$, as might be expected. □
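Both ingredients of Example IV.8 can be checked numerically in a finite setting. The sketch below is a classical illustration under assumptions: the function values `f` and the distributions `P` and `Q` are hypothetical, the state is a diagonal two-state $\hat\omega = (0.3, 0.9)$, and $N = \mathrm{diag}(1, 0)$. It verifies the lower bound in the duality formula (50), shows that the bound is attained at $Q^\star \propto P e^{f}$, and evaluates the small-$\mu$ approximation that motivates (52).

```python
import math

def log_mgf(f, P):
    """log E_P[e^f] for a finite distribution P."""
    return math.log(sum(p * math.exp(x) for x, p in zip(f, P)))

def rel_entropy(Q, P):
    """RE(Q || P) = E_Q[log(dQ/dP)] for finite distributions."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

def lower_bound(f, Q, P):
    """E_Q[f] - RE(Q || P), the quantity maximized over Q in (50)."""
    return sum(q * x for x, q in zip(f, Q)) - rel_entropy(Q, P)

f = [0.0, 1.0, 3.0]          # hypothetical function values
P = [0.5, 0.3, 0.2]          # "nominal" distribution
Q = [0.2, 0.3, 0.5]          # some other ("true") distribution
Z = sum(p * math.exp(x) for x, p in zip(f, P))
Q_star = [p * math.exp(x) / Z for x, p in zip(f, P)]   # maximizing distribution

print(log_mgf(f, P), lower_bound(f, Q, P), lower_bound(f, Q_star, P))

def log_cost(w, mu):
    """(1/mu) log(<w, e^{mu N}> / <w, 1>) with N = diag(1, 0)."""
    return math.log((math.exp(mu) * w[0] + w[1]) / (w[0] + w[1])) / mu

w = (0.3, 0.9)
neutral = w[0] / (w[0] + w[1])                         # <w, N> / <w, 1>
print([log_cost(w, mu) for mu in (1.0, 0.1, 1e-4)], neutral)
```

Any choice of `Q` gives a value no larger than $\log \mathbf{E}_P[e^f]$, which is the mechanism behind the robustness bound (51); the second computation shows the logarithmic cost decreasing towards the risk-neutral value as $\mu$ shrinks.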



APPENDIX A: FORMULAS FOR THE 
TWO-STATE SYSTEM WITH FEEDBACK 
EXAMPLE 

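The following quantities were used in the solution of the risk-neutral problem, Example III.3:

$$V_0(\omega, 1) = \omega_{11}, \qquad V_1(\omega, 1) = \omega_{22} + p$$

$$V_0(\omega, 0) = \omega_{11} + \min[\alpha\omega_{11},\ p + \omega_{22} - \alpha\omega_{22}] + \min[\omega_{11} - \alpha\omega_{11},\ p + \alpha\omega_{22}]$$

$$V_1(\omega, 0) = p + \alpha\omega_{11} + \omega_{22} - \alpha\omega_{22} + \min[\alpha\omega_{11},\ p + \omega_{22} - \alpha\omega_{22}] + \min[p + \alpha\omega_{11},\ \omega_{22} - \alpha\omega_{22}]$$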

The following quantities were used in the solution of the risk-sensitive problem, Example IV.7:

$$W_0(\hat\omega, 1) = \frac{(e^{\mu}\hat\omega_{11} + \hat\omega_{22})^2}{\hat\omega_{11} + \hat\omega_{22}}$$

$$W_1(\hat\omega, 1) = \frac{e^{\mu p}\big(\hat\omega_{11}\hat\omega_{22} + e^{2\mu}\hat\omega_{11}\hat\omega_{22} + e^{\mu}(\hat\omega_{11}^2 + \hat\omega_{22}^2)\big)}{\hat\omega_{11} + \hat\omega_{22}}$$

For $k = 0$ the quantities $W_0(\hat\omega, 0)$ and $W_1(\hat\omega, 0)$ each consist of two minima, one for each measurement outcome. By (45) and the real multiplicative homogeneity of $W(\cdot, 1)$ they can be written as

$$W_u(\hat\omega, 0) = \min\big[W_0(\Gamma_R(u, -1)\hat\omega, 1),\ W_1(\Gamma_R(u, -1)\hat\omega, 1)\big] + \min\big[W_0(\Gamma_R(u, 1)\hat\omega, 1),\ W_1(\Gamma_R(u, 1)\hat\omega, 1)\big], \qquad u = 0, 1,$$

with $\Gamma_R(u, y)$ as in Example IV.7.